A moving-window bayesian network model for assessing systemic risk in financial markets

Systemic risk refers to the uncertainty that arises due to the breakdown of a financial system. The concept of “too connected to fail” suggests that network connectedness plays an important role in measuring systemic risk. In this paper, we first recover a time series of Bayesian networks for stock returns, which allow the direction of links among stock returns to be formed with Markov properties in directed graphs. We rank the stocks in the time series of Bayesian networks based on the topological orders of the stocks in the learned Bayesian networks and develop an order distance, a new measure with which to assess the changes in the topological orders of the stocks. In an empirical study using stock data from the Hang Seng Index in Hong Kong and the Dow Jones Industrial Average, we use the order distance to predict the extreme absolute return, which is a proxy of extreme market risks, or a signal of systemic risks, using the LASSO regression model. Our results indicate that the network statistics of the time series of Bayesian networks and the order distance substantially improve the predictability of extreme absolute returns and provide insights into the assessment of systemic risk.


Introduction
Systemic risk refers to a breakdown of a financial system, which leads to a severe economic downturn, caused by connections within the financial system or natural disasters. Systemic risk is extremely harmful, since it may trigger cascading failures and losses [1,2]. Cascading failure refers to a process in which the failure of a few parts in an interconnected system triggers the failure of the other parts. If a financial system is tightly connected, the failure of a small part can eventually lead to a huge turnover of the whole system. This leads to the concept of "too connected to fail" [3]. Several works have documented higher level of network connectedness during the financial crises [4,5]. Examples include the financial crisis of 2007-2008 due to the default of an investment bank and the close links between the financial systems [2,6], the Chinese stock market turbulence in 2015 due to the inflation of the stock market bubble [7,8], and the sharply increased systemic risk during the COVID-19 pandemic [9]. Various approaches to measuring systemic risks have been studied, including the marginal expected shortfall [10] and the conditional value-at-risk [11]. These measures take correlations into account, whereas a correlation alone is not adequate for capturing connectedness, since it does not consider the interactions or the systemic importance in a financial system [12]. Network analysis is a much more comprehensive way of evaluating connectedness than value-at-risk. A network refers to a graph that represents a relationship using connections among a group of objects, and a financial network is a network with stocks as the objects. The group of objects and the connections are called nodes and edges, respectively, in the context of network analysis. By analyzing the high level of network connectedness in a financial network [13], we may be alerted before the market disturbances spread, which leads to a high level of volatility in a market. Volatility can be considered as a trigger or the cause of a crisis, and volatility can serve as a kind of precautionary signal for systemic risks. The best we can do is to avoid excessive volatility [14]. Throughout the paper, we aim to develop a measure to improve the prediction performance of the extreme financial risk, referred to as the order distance. We also propose the use of a rolling-window method in regard to learning a time series of Bayesian networks, in order to learn changes in the topological properties of the financial market, especially the order distance. We showcase how the topological features of time series of Bayesian networks can help assess financial risk using two sets of data from the Hang Seng Index (HSI) and the Dow Jones Industrial Average (DJIA) Index.
Rolling-window methods have been applied to financial risk management in previous studies. For example, [15] uses a rolling-window method to learn a time series of undirected networks for a stochastic actor-oriented model; [16] proposes a textual modeling method using a rolling-window scheme in a regression model, to predict a market volatility proxy based on the GARCH residuals; [17] studies the effect of the COVID-19 pandemic on financial market connectedness and systemic risk using rolling-window Granger-causality tests; and [18] uses a rolling-window method and a combined method with the topic modeling and network analysis to assess systemic risk in financial markets. Network analysis has been applied to risk assessment [19], pandemic risk management [20][21][22] and financial risk management [13,18,23,24]. To quantify extraordinary financial risk due to financial connectedness or financial contagion, a common application of network analysis involves systemic risk modeling [25][26][27]. There are recent examples measuring financial companies' contributions to systemic risk through the financial networks built using the tail risk of asset returns [28], conducted in order to study the impacts of the COVID-19 pandemic on the financial networks, based on partial correlations of the stock returns [13]; to detect early signals of financial contagion from financial networks, based on partial correlations of stock returns, and pandemic networks, based on the daily newly confirmed cases [17]; and to assess systemic risk in financial networks in terms of dynamic topic networks, based on topic similarities in the news [18]. The above studies are categorized as undirected graph applications, where edges in their networks have no direction. In this paper, we consider the use of a rolling-window Bayesian network model for assessing systemic risks in financial markets.
A Bayesian network is a popular probabilistic graphical model that represents a set of variables and their conditional dependence using a directed acyclic graph (DAG). A DAG is a graph containing only directed connections; it does not contain a cycle. The nodes in any DAG can be ranked in a topological order [29]; that is, we can tell which nodes are more important from the DAG. This feature effectively fits systemic modeling, in that we can measure the relative importance of the stocks, such that the stocks ranked at the front of the topological order can be associated with the sources of the market disturbances, and the topological order itself can be used as a list of directions regarding the flow of the market disturbances. Bayesian networks are often used to represent causal relationships; given an event occurring, we can calculate the probability of the consequences. Bayesian networks have been used in many real-world applications, including diagnosis and forecasting [30,31], and constitute an effective modeling approach in risk management [32], such as in modeling systemic risk using credit default swap data [33]; in detecting variables that affect the buy-sell signal for the S&P 500 index, including price-to-earnings ratio and price-to-sales ratio [34]; in credit risk modeling [35]; in modeling operational risk [36]; and in using DBN to generate early warning signals in systemic banking crises [37].
Note that, although we analyze a time series of Bayesian networks, our proposed methodology does not involve standard dynamic Bayesian network modeling. A dynamic Bayesian network models the dependency of variables over adjacent time steps. For example, the nodes of a sequences of Bayesian networks are assumed to follow homogeneous first-order Markov processes [38]; that is, the nodes at time t are dependent only on the nodes at time t−1. Another example is a framework specifically modeling the evolution of the dependencies of a sequence of Bayesian networks [39]. We do not model the evolution of the dependencies, but we do try to extract information from a time series of Bayesian network using a rolling-window method.
We propose the order distance, an indicator that measures the change in the topological orders of the Bayesian networks, which are learned using the rolling-window method, on two consecutive days, using MCMC structural learning algorithms [40,41]. We aim to use the order distance to improve the volatility prediction for the assessment of systemic risk in a financial market. We provide empirical evidence that a high level of change in the topological order is often accompanied by a high volatility. The empirical evidence is discussed in the section on order distance, and the detailed results are in S3 Appendix in S1 File. The order distance is used as predictor in a LASSO prediction model. Another indicator, the modified network density, is also included in the predictive model, treated as a predictor or a control variable. The Granger-causality tests are significant in regard to quite large portions of trading days, supporting the idea that the two risk indicators are useful in predicting the absolute returns. We illustrate the use of DBN and the topological order to help predict extreme market changes, and thus provide insights into assessing systemic risk. We will show that the two indicators are useful for improving the prediction performance of the volatility using a LASSO regression prediction model. It supports that the order distance contains useful information for predicting volatility. Also note that, the proposed order distance is not restricted to being used as a predictor in our LASSO prediction model, but it could be applied to any other volatility model as a predictor.
Specifically, an overview of the methodology of this study is shown in Fig 1. In this paper, we use the financial returns of 46 to 60 constituent stocks in the Hang Seng Index of Hong Kong from 2008 to 2021 (3,404 trading days) to estimate daily Bayesian network structures using a rolling-window method. The numbers of the HSI's constituent stocks were increased from 46 to 60 over the study period. The returns were discretized into dichotomous variables (0: downward movement; 1: upward movement) to make modeling more straightforward. We combine several structural learning algorithms to enhance the computational efficiency. The topological order and modified network density are then obtained from daily DBNs as risk indicators, measuring topological features derived from the inherent Markov dependence in Bayesian networks on each trading day. We expect that these topological features to provide useful information on market risks, especially under adverse and extreme market conditions. These two risk indicators were adopted to predict a proxy of the volatility, the absolute return of the HSI. Using the absolute return as a proxy of volatility is a well-established method [42][43][44]. This proxy has been widely used as a measure of market risk in the literature [16,45]. We also use the loss, defined as the negative part of the return of the HSI, as a proxy of market losses. To investigate the effectiveness of the two risk indicators, Granger-causality tests were conducted using the absolute return and the loss as responses to see when the risk indicators are useful to predict the absolute return or the loss. The predictive performance was also reported, with particular attention paid to tail events. The above methodology is also applied to the Dow Jones Industrial Average Index in order to test the predictive performance of the order distance in different stock markets.
The Methodology Section introduces the concepts related to Bayesian networks, including the network statistics, network density, and topological orders. The details of the proposed order distance are also included in the Methodology Section. The Rolling-window Bayesian Network Modeling Section contains the details of the data used in the empirical study in this paper, the procedures involved in the structural learning of a time series of Bayesian networks, and the details of the methods used to evaluate the usefulness of the order distance (including

PLOS ONE
the Granger-causality tests and a LASSO regression prediction model). The Main Results Section provides an illustration of the learned financial networks and their summary statistics, the Granger-causality tests results, and the prediction performances of the rolling-window prediction study. The Discussion Section explains the results in the Main Results Section and their implications. A discussion of the choice of the parameters is also included. A conclusion is also provided in the Discussion Section.

Bayesian networks
In this paper, we adopt a moving-window scheme in regard to Bayesian networks for modeling the dependence of financial returns over time. A Bayesian network (BN) is a graphical model that can be used to specifying the joint probability distributions of a set of variables [46]. More specifically, BNs can represent conditional dependence among variables using a directed acyclic graph (i.e., a directed graph that does not contain any cycles). The graphical representation of a BN is called a network, a graph, or a structure. A variable in a BN is called a node, and a directed connection to or from a node is called an arc or a directed edge. Let X 1 , . . ., X n be n variables in a BN. We say that an edge X i ! X j is in the BN if there exists an edge linking X i to X j in the BN. When we have the edge X i ! X j in the BN, X i is a said to be a parent of X j and X j is said to be a child of X i . When there is a path X i ! . . . ! X j linking X i to X j by a sequence of edges, X j is said to be a descendant of X i . By definition, a child of a node X i is also a descendant of X i . The parent set of a variable X i is denoted by pa(X i ), which contains all parents of X i . A network is said to contain a cycle if there exists a node X and a sequence of nodes The structure of any BN has to be a DAG, meaning that a BN does not contain any cycles. A BN satisfies the Markov assumption, which states that conditional on the parent set, a node is independent of other variables except its descendants. From the Markov assumption, the joint density function of the BN of n variables X 1 , X 2 , . . ., X n can be factorized into Fig 2 example gives an example of a BN consisting four variables: X 1 , X 2 , X 3 , and X 4 . The arrows between variables represent their edges, and the edges specify the dependence structures among the variables. For example, X 4 depends on both X 3 and X 1 , and X 1 and X 2 are the root nodes (i.e., nodes with no incoming edge). The joint probability distribution of the four variables can be factorized based on the dependence structure specified in Fig 2 example: The factorization in Eq (1) greatly simplifies the complex connectedness among variables and forms a score for the BN structural learning [29].

Network statistics
In general, Bayesian networks can be time-varying. In the discussion below, we denote a BN at time t by G t , which is defined by its set of nodes and its set of edges at time t. To efficiently summarize the statistical properties of G t , the following two summary statistics are used. The first one is the network density, and the second one is the topological order of the variables. Network density. The network density measures the connectedness of a network. Intuitively, the more connections or more edges a BN has, the higher the network density will be. The network density at time t is defined as where v t and e t are, respectively, the number of nodes and the number of edges of the network G t . It is the ratio of the number of edges in G t to the maximum possible number of edges, v t 2 À � ¼ v t ðv t À 1Þ=2, implying that NDðG t Þ lies between 0 and 1. As edges in a BN represent directed links among variables, NDðG t Þ, which gives the proportion of edges in the BN, quantifies the complexity of the BN to represent hidden conditional dependence structures among variables. In the structural learning of Bayesian networks, we often impose a restriction on the maximum number of parents for each node for the sake of computational efficiency. Under this restriction, the maximum possible number of edges is much smaller than v t 2 À � . Suppose

PLOS ONE
that the maximum number of parents of each node is M (M � v t ). It can be shown that with the restriction, the maximum possible number of edges in a BN is v t M−M(M + 1)/2. More details of this maximum number of edges are provided in the S2 Appendix in S1 File. We define the modified network density with the maximum parent size M as The network density in Eq (4) extends that in Eq (3). For the special case of M = v t −1, in which there is no restriction on the maximum number of parents, MNDðG t ; v t À 1Þ is reduced to NDðG t Þ. In the empirical study, we take M = 13, since the computation will be very slow when we choose a larger M. We will see how Eq (4) can be used to quantify systemic risk from the connectedness of the stocks in the empirical study.
Topological orders. Bayesian networks are also known as causal networks because the network configuration may give us information on the "cause-and-effect" interpretation from a probabilistic perspective. For example, the factorization in Eq (2) for the BN in Fig 2 example gives us an idea of which variables are the "causes" and/or "effects" of the other variables. In the joint distribution in Eq (2), X 1 appears in P(X 1 ), P(X 3 |X 1 , X 2 ), and P(X 4 |X 3 , X 1 ). We can interpret that X 1 as a "cause" of X 3 but not vice versa because X 1 appears in the conditional distribution of X 3 as a given variable in the factorization. Similarly, X 1 and X 3 can be regarded as "causes" of X 4 according to the network structure in Fig 2 example. If we take the potential cause-and-effect structure into consideration, we can say a "cause" variable is lower than an "effect" variable. We can then talk about topological order as a topological feature of a BN.
A topological order of a BN defined in [29] refers to the ranking of the variables in the network. In a Bayesian network, if X i precedes X j in the topological order, then an edge from X j to X i is not allowed or X j is not a parent of X i . For example, in Fig 2 example, one possible topological order is (X 2 , X 1 , X 3 , X 4 ) according to the rule stated above. Let T(X i ) be the position of X i in a topological order. Then, the topological order (X 2 , X 1 , X 3 , X 4 ) in Fig 2 example corresponds to T(X 1 ) = 2, T(X 2 ) = 1, T(X 3 ) = 3 and T(X 4 ) = 4. The topological order (X 1 , X 2 , X 3 , X 4 ) is also allowed since neither X 1 nor X 2 is a parent of each other. For this topological order, we have T(X 1 ) = 1, T(X 2 ) = 2, T(X 3 ) = 3 and T(X 4 ) = 4. As there are multiple topological orders corresponding to a given BN, a fairer way to rank the variables is to take the median of all possible topological orders. It can be very computationally intensive to list out all possible topological orders when the network size (i.e., the number of nodes v t ) is large. Instead of identifying all topological orders of a Bayesian network at time t, G t , an alternative method is to sample a subset of topological orders from O G t , the space of all possible topological orders corresponding to G t . Then, we can estimate M t ðX i Þ ¼ medianfT t ðX i Þ : ðT t ðX 1 Þ; :::; T t ðX v t ÞÞ 2 O G t g using the sampled topological orders. We use the Kahn's algorithm [47] to help generate samples from O G t . The Kahn's algorithm is a node sorting algorithm. We need to input an initial topological order of the nodes into this algorithm, then the algorithm will start to sort the nodes based on the initial topological order. For each initial topological order, the algorithm returns a topological order that is compatible with G t . To generate samples of topological orders from O G t , we can input different randomly generated initial topological orders to the algorithm.
There are v t ! ways to input the initial topological orders. In the empirical study, we will randomly pick K = 100 initial topological orders to input into the algorithm for obtaining samples of topological orders for each G t and to estimate M t (X i ) using the samples.
To visualize and compare the topological order of the nodes on different days, since the numbers of nodes can change over time, the range of M t (X i ) can be different. To facilitate the comparison, a relative topological order of the network G t is defined: Note that, if M t (X i ) = 1 (i.e., at the head of the topological order), then RM t ( , at the tail of the topological order), then RM t (X i ) = 0. RM t (X i ) always lies between 0 and 1, and a larger RM t (X i ) indicates that X i is relatively near the head of the topological order. As in M t (X i ), we can estimate RM t (X i ) by using sampled topological orders from O G t .
In financial modeling, this relative topological order allows us to quantify the importance of each stock in a Bayesian network at different time points. If a stock is topologically ordered near the front, it can be regarded as the "cause" of the changes in other stocks in the Bayesian network, or as part of the sources of market fluctuation. On the other hand, if a stock is topologically ordered near the tail, the stock's influence on other stocks' fluctuation may be relatively smaller. In short, RM t (X i ) helps measure the relative importance of the stocks in a BN over time. If RM t (X i ) changes substantially in a short amount of time, it may indicate unexpected changes in the market fluctuation, or rapid increase of systemic risk in financial markets.
Order distance. To trace the stability of financial markets over time, we propose measuring the changes in the topological orders. The rationale behind this is that the network topology is likely to be stable when the market is "quiet" or the volatility of the market is low. Conversely, the network topology can change substantially when there are severe adverse changes in the market. Therefore, being able to quantify changes in the topological order can help assess financial risk, especially for events leading to extreme market movement, which triggers systemic risk in financial markets. Let V t be the set of nodes in the Bayesian network at time t. Consider the vector of median topological orders, fM t ðX i Þg i¼1;:::;v t , on two consecutive days: day t−1 and day t for the stock X i . Since V t and V t−1 may have different nodes, we only consider the nodes in V t−1 \V t to define a topological order distance on day t. Note that the scale of M t (X i ) and M t−1 (X i ) can also be different, since M t (X i ) takes values from 1 to v t . Following the idea of setting portfolio weights in asset allocation, we can define a vector of normalized topological orders on day t with respect to the set V s as By construction, all elements in NT t (V s ) lie between 0 and 1 and The elements in the vector in Eq (6) are analogous to the portfolio weights for investing in the stocks represented by the nodes in V s , putting more weights on the stocks near the head of the topological orders.
The order distance between two consecutive networks on trading day t−1 and t is defined as where k�k 1 is the L 1 -norm. We use of L 1 -norm to define the order distance because in NT t (�), v t − M t (X i ) is normalized by the sum of its elements (which are all positive). This normalization can be regarded as a L 1 -normalization. If we choose L p -norms (p > 1), the corresponding order distance will tend to decrease when the dimension of NT t (�) increases (which is undesirable for a regression predictor).
We use the order distance because we believe that there is a relationship between a high level of volatility and a rapid change in network connectedness, since a volatile market is often unstable. Specifically, we posit that the association between the volatility and order distance is more likely to be positive whenever the volatility is high. We provide empirical evidence of this phenomenon in S3 Appendix in S1 File. We fit moving-window regressions using the absolute daily returns of the Hang Seng Index, a proxy of the volatility, as the responses, and the estimated order distances in the previous trading day as the predictor. The modified network density and the lagged absolute returns are also included as control variables. We obtained the estimates of the coefficients of the order distances on each of the trading days over the study period. We partition the log-transformed absolute returns (log-transformation is conducted because we want to ensure the results can be easily visualized), and count the proportion of positive coefficients in each of the intervals. We found that the proportions are, on average, larger whenever the absolute returns are larger. The same analysis is conducted using the Dow Jones Industrial Average Index; the proportions show a similar phenomenon. The order distance is thus useful for predicting volatility in a period with high levels of volatility, and a high level of volatility is often accompanied by a large order distance. Based on this phenomenon, we will focus on extreme scenarios in this paper. We provide a more detailed discussion of these extreme scenarios in the LASSO Regression Prediction section later in this paper. More details of the above empirical evidence can be found in S3 Appendix in S1 File.

Rolling-window Bayesian network modeling
Data. To demonstrate our proposed Bayesian network method in assessing systemic risk using topological features of the network, we obtain the closing prices of the HSI's constituent stocks [48] in the Hong Kong stock market and the closing prices of HSI from 2 January 2008 to 4 November 2021 (3,404 trading days). We use the HSI because the Hong Kong Stock Exchange is one of the most representative exchanges in the world [49]. We believe that the Hong Kong stock market is a useful example for studying systemic risk. Moreover, the constituent companies in the HSI represent about 65% of the capitalization of the Hong Kong Stock Exchange [50]. It is important to study the systemic risk of the constituent stocks in the HSI. A time series plot of the HSI is given in Fig 7. The constituent names, stock symbols and their sectors are listed in the S1 Appendix in S1 File. Let P it and R it = logP it − logP i(t−1) be the closing price and log-return, respectively, of the i-th stock on day t, where day 0 refers to 2 January 2008. The data on 2 January 2008 are used to calculate the log-return on 3 January 2008, so that there are altogether T = 3403 returns in our investigation. Since the composition of the HSI is reviewed regularly, both the numbers of constituent stocks and the list of constituent stocks change over time. In the period of the empirical study, the number of the constituent stocks increases gradually from 46 to 60. The construction of the time series of Bayesian networks is based on the HSI's constituent stocks. Therefore, the number of nodes v t ranges from 46 to 60 in the empirical study.
In this paper, we focus on dichotomous return variables in the Bayesian network construction though, in general, our methodology can also be applicable to continuous variables. From a practical perspective, it is also easier to model Bayesian networks using discrete data than continuous data. Specifically, we define the return indicator of the i-th stock on day t by Structural learning. We adopt a rolling-window approach and use the return indicators X it to construct a time series of Bayesian networks. Let D t be the set of return indicator variables of the HSI's constituent stocks on day t. For each day t, we use the data sets D (t−w+1):t = {D t−w+1 , . . ., D t } to construct G t , the BN on day t, where w is referred to as the rolling-window size. Since the number of variables in D t can be different on different day t, the network G t only includes variables that are common in all data sets in D (t−w+1):t . We employ the Markov chain Monte Carlo (MCMC) structural learning algorithms, including the Structure MCMC algorithm [40] and the Order MCMC algorithm [41] to estimate G t by obtaining the network that maximizes the Bayesian Dirichlet equivalent uniform (BDeu) score [29] using the data set D (t−w+1):t .
Structural learning refers to a task for learning the network structure of a set of variables from the data. There are mainly two classes of structural learning algorithm: score-based algorithms, which maximizes an objective function, and constraint-based algorithms, which tests for conditional independence structures in the data. MCMC is a score-based algorithm, allowing us to sample from the target distribution, which is difficult to generate directly. The most traditional MCMC structural learning algorithm is the Structure MCMC algorithm proposed by [40], conducts MCMC sampling by adding, deleting, or reversing an edge in the network. Specifically, on a day t, let G and G 0 be two structures of a Bayesian network, where G 0 is obtained from G by adding, deleting, or reversing an edge. The Structure MCMC is a Metropolis-Hastings algorithm with an acceptance ratio, in our case, PðG 0 jD ðtÀ wþ1Þ:t ÞqðGjG 0 Þ PðGjD ðtÀ wþ1Þ:t ÞqðG 0 jGÞ where PðGjD ðtÀ wþ1Þ:t Þ is the BDeu score for the structure G using the data set D (t−w+1):t , and qðG 0 jGÞ is the transition kernel from the structure G to G 0 . We take the structure with the highest BDeu score in the Structure MCMC as G t . Another algorithm we used in the study is the Order MCMC algorithm proposed by [41], which conducts MCMC sampling on the topological order space. The procedures of the Order MCMC algorithm are similar to those in the Structure MCMC algorithm, except we need to modify the BDeu score. We combine both algorithms to enhance the efficiency in terms of searching networks with high BDeu scores. The main advantage of using MCMC algorithms in our study is that it enables us to avoid becoming tapped in local maximum [51], making it more efficient to learn networks compared to some other optimization methods, such as greedy hill-climbing. This feature is especially important when it is necessary to learn many networks across thousands of trading days, as in our rolling-window modeling.
The maximum number of parents for each node, M, is set to 13. We choose M = 13 because the computation will be very slow if we were to use M > 13. Since we need w days of data to learn the structures, we can only obtain a time series of networks fG w ; G wþ1 ; . . . ; G T g. The workflow of the construction of the time series of networks is shown in the timeline in Fig 3a. The first estimated structure is on day w using the data set D 1:w . The next structure to be estimated is on day w + 1 using the data set D 2:(w+1) , and so on.
Granger-causality tests. An objective of this paper is to apply the network statistics of Bayesian networks in financial risk management. We investigate if the modified network density in Eq (4) and the order distance in Eq (7) are helpful in predicting future losses and risks in the market. We use the HSI returns as a proxy of market losses. Let R t be the return of HSI on day t and define the loss on day t by Loss t = −min{R t , 0}. Following [17], we use a recent sample of observations to study any lead-lag relationships between historical network statistics and the loss on day t. In the empirical data analysis, we include observations from day t − w GC + 1 to day t to conduct the Granger-causality tests by fitting the following regression models, where w GC = 40 is the sample size:

PLOS ONE
where a i , b j , c k and d ℓ are regression coefficients, ε t 's are random errors in the regression models, and L is the number of lags included in the Granger-causality tests. The H 0 represents the null model, which contains lagged losses as the only independent variables. The H 1a and H 1b respectively test whether the lagged modified network density and the order distance are helpful for the prediction of the next-day loss, and H 2 includes both independent variables and their interaction terms. The Granger-causality tests in Eq (9) are conducted for L = 1, 2, 3, 4, 5.
Note that, the sample size w GC is always fixed. Fig 4 shows the use of data from day t−w GC + 1 to t to perform one Granger-causality test. When we use a lag L, we use the first L observations as lagged values. Then, including the data on the current day, we use L + 1 days of values for each hypothesis in Eq (9). Thus, effectively, we have w GC −L rows of data to input into each Granger-causality test. Similarly, we also conduct another set of Granger-causality tests in regard to the usefulness of the network statistics in predicting the volatility. A high level of volatility is associated with a high level of instability in financial markets and, thus, being able to predict future volatility movement can help predict market risk. The increase in volatility to a very high level can be associated with potential systemic risks in financial markets. A common proxy of volatility is the absolute return [44], and the corresponding null hypothesis is with similar alternative hypotheses as in Eq (9), replacing Loss t−i with |R t−i |, i = 0, 1, . . ., L. The workflow of the Granger-causality tests is summarized in Fig 3b and 3c. In Fig 3b, the first order distance that can be calculated is on day w + 1 since we need both NT w (V w ) and NT w+1 (V w ) to calculate OD w+1 according to Eq (7), and the first available normalized topological order is NT w (V w ) from the first network G w . The next order distance OD w+2 is calculated from NT w+1 (V w+1 ) and NT w+2 (V w+1 ), and so on. Then, the first day on which we can conduct a Granger-causality test is on day w + w GC , using the risk indicators from day w + 1, the first day an order distance is available, to day w + w GC , leading to, altogether, w GC days of observations, as shown in Fig 3c. The next test is conducted on day w + w GC + 1, and so on. There are in total T − (w + w GC ) + 1 days on which we conduct the Granger-causality tests. LASSO regression prediction. In addition to the Granger-causality test, to investigate any significant leading effects of the modified network density and the order distance on the losses or absolute returns, we also evaluate the predictive performance using the models under four hypotheses in Eq (9) using the loss as the response, and that using the absolute return as response in Eq (10). A better prediction of volatility, as a trigger and a cause of financial crises, allows us to detect systemic risk events more accurately. On each day t, the most recent m = 40 days of observations (i.e., observations from day t − m + 1 to t) will be used to fit LASSO regression models for one-day predictions using the models under different hypotheses in

PLOS ONE
Eq (9) and Eq (10). LASSO regressions may help reduce the variance and improve the prediction by shrinking some regression coefficients to zero [52], due to the penalty term. We will fix L = 5 for the models in H 1a and H 1b in the prediction but, unlike in Granger-causality tests, L = 1, 2, 3, 4 are not included because the LASSO regression will automatically shrink the coefficients to zero if some lags are irrelevant. For the models in H 2 , we pick a smaller number of lags L = 3, since this model contains more parameters than the models in H 1a and H 1b . We need to set a smaller L, otherwise the degrees of freedom in the model will be too small, which may lead to overfitting. Similar to the mechanism within a Granger-causality test, we show the mechanism of fitting a LASSO regression model on a day t in Fig 5. The first L observations are used as lagged values. Including the observation on the current day, there are L + 1 days of values for each equation in Eq (9) and Eq (10). Thus, effectively, we have m−L rows of data to fit each LASSO regression model.
Let y t be any one of the Loss t or R t , as in the models in Eq (9) or Eq (10), respectively. We let y t (1;H i ) be the one-day-ahead prediction of y t+1 using the model under the hypothesis H i , i 2 {0, 1a, 1b, 2}, given the information up to time t. The root-mean-square error (RMSE) is used to evaluate the predictive performances The workflow of the regression prediction is summarized in Fig 3d. The first fitted model is on day w + m, using the data from day w + 1, the first day an order distance is available, to day w + m. We then conduct a one-day prediction for day w + m + 1. The next fitted model is on day w + m + 1, using the data from day w + 2 to w + m + 1. We then conduct a one-day prediction for day w + m + 2, and so on. The last prediction is on day T. Thus, there are a total of T − (w + m) predictions, as in the denominator of Eq (11).
On top of evaluating the overall performance, it is also meaningful to look into the predictive performance under tail scenarios (or extreme scenarios), which are more crucial to systemic risk. The following methodology (tail-event strategy) is specifically designed for developing a predictive strategy, where we believe that the network statistics based on the time series of Bayesian networks can be more effective in predicting market risk under tail scenarios. We propose using the predicted values of the loss or absolute return only when the observed loss or absolute return one day before the prediction is large, which may indicate the occurrence of tail scenarios in the next few days. In other words, we develop the following prediction scheme where we selectively use the predicted values generated by the network statistics to focus on tail scenarios, or to infer the systemic risk of the market. On each day t, we accept the prediction whenever the observed value, y t , is larger than an m 0 -day rolling-window p-th percentile q t (p) (i.e., the p-th percentile of {y t , y t−1 , . . ., y max{1,t−m 0 +1} }). Otherwise, we do not include the predicted value in the RMSE assessment. Let Q(p) = {t:y t > q t (p), t = w + m, . . ., T−1} be the set of time indexes when the predicted value y t (1, H i ) is included in the RMSE Fig 5. The data included when fitting a LASSO regression model on day t, where m = 40, L = 5 for models under  H 1a and H 1b , and L = 3 for models under H 2 . d t represents the data available on day t. https://doi.org/10.1371/journal.pone.0279888.g005 PLOS ONE assessment, as in (Eq (11)). We also use the RMSE to evaluate the predictive performance of the tail-event strategy. The RMSE only assesses the prediction on the days with large observed losses or absolute returns, the time indexes of which are in the set Q(p). The RMSE is then defined as RMSE p ðH i Þ ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi 1 QðpÞ Both RMSE(H i ) and RMSE p (H i ) are used to assess the predictive performance of the modified network density and the order distance in regard to the loss and absolute return.

Visualization of financial networks
It is more efficient to explore the data across five sectors, classified according to the subindexes listed in [48]. The five sectors are: commerce (29 stocks), finance (11 stocks), information technology (four stocks), properties (12 stocks), and utilities (four stocks). The detailed description of these stocks are shown in the S1 Appendix in S1 File. The networks are learned using the window size w = 30 and the T = 3403 trading days of return indicators in Eq (8). The networks on three selected dates are shown in Fig 6. These are three example networks showing the importance of the order distance. Fig 6(a) shows the network on 27 October 2008 during the global financial crisis due to subprime mortgage problems [6]. Fig 6(b) shows the network on 26 January 2018, the day on which the HSI has the highest price ever. Fig 6(c) shows the network on 23 March 2020 during the COVID-19 pandemic. The HSI was at its lowest price of the year on this day in 2020. The connectedness of the networks is similar across sectors. The size of the nodes is proportional to the 60-day moving average of the relative topological order in Eq (5). The 60-day moving average is calculated to enable us to clearly visualize the trend of the relative topological order. Fig 6 illustrates the behavior of the relative topological order on abnormal days (during the financial crisis of 2008 and the COVID-19 pandemic) and on a normal day. During the financial turmoil taking place on the days in Fig 6(a) and 6(c), the topological orders of the stocks are quite imbalanced; some nodes are very large while some are very small, showing that some stocks are more influential than the others. Fig 6(b) shows a network on 26 January 2018, within a period in which there was no particular extreme financial event. The 60-day moving average of the relative topological order of the stocks appears to be much more uniformly distributed, with similar node sizes. Fig 6 shows that the topological order of stocks is more stable during a normal period than in periods when high levels of financial uncertainty are anticipated. We believe that the relative topological order of stocks is associated with the stability of the market. Changes in the relative topological order can help identify adverse changes in the financial market. Therefore, we hypothesize that the topological order distance in Eq (7) can help track systemic risk when using Granger-causality tests and predictive analysis.

Summary statistics of the financial networks
To summarize the networks in the period under study, Fig 7 shows the time series plot of the 60-day moving average of the relative topological order in Eq (5), with the stocks grouped by sector, the HSI closing prices, and the number of variables (or number of HSI's constituent stocks) in the networks. In the first panel in Fig 7, the finance sector has a high relative topological order, except that the relative topological order of the utilities sector has increased above that of the finance sector during the COVID-19 pandemic. The topological order of the sectors changes over time, indicating that the market is led by different sectors at different periods. The topological orders change more rapidly in some periods; for example, during the COVID-19 pandemic in 2020, and in 2022, when the market was unstable (highlighted by the red rectangle marked with (a) in the first panel in Fig 7). The utilities sector is bumped up in the topological order, whereas the finance sector moves down. The lines representing the finance sector (green solid line) and the properties sector (pink dashed line) in the first panel in Fig 7 intersect several times during 2020 and 2022, indicating that the topological orders of

PLOS ONE
these two sectors change rapidly. In contrast, the topological orders of the sectors from 2016 to 2018 (highlighted by the red rectangle marked with (b) in the first panel in Fig 7) are more stable, aside from the information technology sector, which contains only a few stocks. The finance sector (green solid line) has a high topological order during the period from 2016 to 2018, followed by the utilities sector (light green dotted line), the properties sector (pink dashed line), and the commerce sector (blue dashed line). Their topological orders stay at similar levels in this period, with only a few intersections. These examples show that the change in topological orders may be associated with the stability of the market. We quantify the degree of fluctuation of the relative topological order using the order distance in Eq (7), and conduct Granger-causality tests to predict and analyze the relationship between the topological orders and the systemic risk. The numbers of constituents in the HSI increases from 46 to 60, as shown in the fourth panel in Fig 7.  Fig 8 shows the time series plots of the loss and absolute return of the HSI in the first two panels, and the order distance and modified network density of the networks in the third and

PLOS ONE
fourth panels. Note that these statistics are not smoothed; they are directly used in the Granger-causality tests and rolling-window prediction study directly. In the first two panels, the loss and absolute return show some clustering patterns. We aim to predict the periods with extreme losses and absolute returns in order to establish early warning signals of possible systemic risk. In the third and fourth panels, the order distance and modified network density also show some clustering patterns, showing that the loss and absolute return may be associated with the order distance and the modified network density. Below, we evaluate the association, also the usefulness and performance of the order distance and modified network density in predicting the loss and absolute return, as two proxies of systemic risk below.

Granger-causality tests
The Granger-causality tests are conducted using the window size w GC = 40, from day w + w GC to day T (from 16 April 2008 to 04 November 2021), with a total of T − (w + w GC ) + 1 = 3403 − 70 + 1 = 3334 trading days, as shown in the timeline in Fig 3c. Table 1 of day granger significant shows the numbers of trading days out of total 3,334 days that have at least one Granger-  Table 1. The number of trading days (out of total 3334 days) that have at least one Granger-causality test (from lag 1 to lag 5) with a significant result at 0.1 level. The rows are the responses and the columns are the hypotheses tested in the Granger causality tests stated in Eq (9) and Eq (10).

PLOS ONE
causality test, from lag 1 to lag 5, with significance at 0.1 level. The numbers range from 682 to 1,108, which correspond to around 20% to 33% of the trading days, showing that there is a high proportion of trading days on which the modified network density and the order distance "cause" the market loss or volatility in the Granger-causality sense. We find more significant results in regard to the loss than in regard to the absolute return. The results suggest that the order distance and modified network density are useful in predicting the absolute return and loss, and the evidence for this finding seems to be stronger when the target variable is the loss. Fig 9 shows the time series plots of the loss and absolute return, with blue dots denoting the trading days on which the loss and absolute return have significant results in the Granger-causality tests. The columns denote the variables used, from left to right: modified network density only (H 1a ), order distance only (H 1b ), and both (H 2 ). There are more significant results detected after 2015. We can also observe from the figure that test results tend to be significant when there are big losses or high absolute returns. We explore the predictive power of extreme losses and extreme absolute returns below, using the RMSE in Eq (12).

Rolling-window prediction study
A LASSO regression prediction study is conducted with a sample size of m = 40, on each day from day w + m + 1 to day T (from 16 April 2008 to 03 November 2021), with a total of T − (w + m) = 3403 − (30 + 40) = 3333 days of predictions as shown in Fig 3d. The parameters in the LASSO regression models are tuned using time series cross-validation. The time series of the responses (loss and absolute return) and the independent variables (order distance and modified network density) are plotted in Fig 8. These variables are used to fit the two models in Eq (9) and Eq (10) under different hypotheses. One-day predictions are carried out for the different hypotheses of the two models on each day. The RMSEs of the prediction using a tail-event strategy with m 0 = 100 are reported in Table 2. The first three parts labeled with

PLOS ONE
"Extreme 0.1%","Extreme 1%", and "Extreme 2%" correspond to the prediction under the tail scenarios, according to the strategy in Eq (12) by setting p = 99.9, 99, and 98, respectively. The last part labeled "all cases" shows the RMSEs calculated using Eq (11), if we conduct predictions for all trading days. The columns denote the models used according to the different hypotheses in Eq (9) and Eq (10). According to the tail-event strategy, the RMSEs of the absolute return are reduced by around 1.9% to 7% with different p in Eq (12) if we include only the order distance or include both the order distance and the network density as independent variables (the last two columns in Table 2, marked H 1b and H 2 ). Note that, although using the modified network density alone (H 1a ) does not improve the prediction, including the order distance only (H 1b ) and including both the order distance and modified network density (H 2 ) can both help to improve the predictive performance. Without using the tailevent strategy (the last part marked with "all cases" in Table 2), the models do not perform well. These results suggest that the order distance and modified network density are more useful for extreme cases. This phenomenon is consistent with the design of the tail-event strategy. Table 3 shows the number of days for which we actually accept the predictions with different p in Eq (12). There are 37, 69, and 116 days that we accept the predictions of the absolute returns for extreme 0.1%, 1%, and 2% scenarios respectively, which range from around 1% to 3.5% of the trading days out of the 3,333 days. Note that these numbers are the same across all H 1a , H 1b , and H 2 for a fixed p, since the set Q(p) depends only on p and the observed data, but not on the predicted values.

Discussion
Volatility is can be considered as a trigger and a cause of a financial crisis, and it can therefore serve as a precautionary signal for systemic risks. The idea underlying the use of Bayesian networks to model the connectedness inside the financial market is that high levels of network connectedness can facilitate the transmission of market disturbance, thereby increasing the market volatility. By detecting the increase in network connectedness, we can be prepared before the market disturbance or changes spread, or the volatility increases substantially, signaling that high levels of systemic risks are evolving. An advantage of using Bayesian networks over undirected networks is that Bayesian networks allow stocks or financial assets to be ordered topologically.
We can see in Fig 6(a) and 6(c) that the relative topological orders of the stocks are quite imbalanced when there is financial turmoil. This can be explained by the cascading failures in highly connected financial networks. These failures are characterized by higher topological orders in a few nodes, such that any financial disturbance can flow to other stocks with lower topological orders, and eventually lead to the failure of the whole financial system. Although we find that the topological orders can be used to predict high levels of volatility, further research is needed in regard to the relationship between the change in topological orders and the flow of financial disturbance.
In Fig 9, we observe that the Granger-causality tests give significant results more often in the period from 2015 to 2021, a period with higher levels of network connectedness (as shown in the third panel in Fig 8), than in the period from 2008 to 2014. This phenomenon is consistent with the finding in [53] that higher levels of connectedness in the banking network within a country are associated with a higher probability of financial crises. More specifically, we can show that the two risk indicators (the modified network density and the order distances) are useful in detecting high levels of volatility, and also have good predictive performances for extreme cases. The Granger-causality tests using two indicators to detect high levels volatility yield significant results more often in the periods with higher levels of network connectedness, which are associated with a higher probability of financial crises. Moreover, our empirical study includes not only banks but also other financial sectors, showing that a similar phenomenon that occurs for banking networks, discussed in [53], also applies to a more general financial network.
The tail-event strategy used in the prediction study is specifically designed for predicting extreme events only. This prediction strategy is slightly atypical, but is useful in practice in this paper, in regard to the reduction in prediction RMSEs in Table 2. In practice, forecasting financial time series is a challenging task. A modest improvement in prediction would be beneficial for the practitioners [54]. The reduction in RMSEs in the predictions supports the idea that the modified network density and order distance can provide additional information to the predictive models. The use of modified network density and order distances is not limited to the LASSO prediction model we introduced. For example, we can include these two risk indicators into volatility models with regressors [55,56].
Whereas the current results showed that the risk indicators help improve the predictability of the tail events, we still need to take into account the robustness of the methodology. We provide sensitivity analyses of M (the maximum number of parents for each node in the structural learning), K (the number of random samples used in the estimation of the median topological order), w (the number of days of data used in the structural learning), m (the number of days of data used to fit the LASSO regression prediction model), and m 0 (the number of days of data included in the m 0 -day rolling-window p-th percentile) in the S4 to S7. Appendices in S1 File. Although the restriction on M is mainly due to the computational burden, it can affect the results. We show that the predictive performance is the best when taking M = 13 in the S4 Appendix in S1 File. We show that increasing the parameter K from 100 to a larger value can only provide a tiny benefit in terms of the accuracy of the estimation of the normalized topological order in the S5 Appendix in S1 File. Hence, the choice of K depends on the time available in practice. Even when we increase K from 100 to 1000, the RMSEs of the estimation of the normalized topological orders are only reduced from around 0.0017 to 0.0013, while the running time is increased by around 11 times, which is very inefficient. Moreover, the improvement is very small compared to the magnitude of the normalized topological order. The choices of the parameters m and w, however, are more restrictive; a longer window may include too much irrelevant information, while a shorter window will lead to unstable parameter estimation problems. We have shown that the choice of w = m = 30 is still good in terms of predictive performances (specifically, the prediction RMSEs show some improvements for the model under H 1b and H 2 for the extreme 0.1% and 1%), while the choice w = m = 60 results in bad predictive performances in the S6 Appendix in S1 File, which show no improvement in all cases. The choice of w = 30 and m = 40 actually performs very well, as seen in the main result of this paper. In contrast, we show that the models perform well in terms of predictive RMSEs when the parameter m 0 is respectively taken to be 40 and 60 in the S7 Appendix in S1 File (specifically, all the prediction RMSEs under the model H 1b and H 2 are reduced for the extreme 0.1%, 1%, and 2%). The choice of m 0 is then dependent on the expert knowledge of the person using the model, in regard to the horizon of interest. The physical meaning of m 0 is that, we want to filter out an extreme day in a window of width m 0 . The proposed order distance indeed helps reduce the predictive performances in many different settings of the parameters. The sensitivity analysis shows a limitation of this study that the choice of parameters may affect the results. A possible way to enhance the robustness of the model is that instead of fixing the parameters, we can tune the parameters over time. One of the main purposes of this paper is to show that the order distances contain useful information for predicting volatility, and the LASSO prediction model is only one of the prediction models chosen to illustrate this. The order distances can be used as predictors in other models. Further research is needed to understand the behavior and predictive performance of the order distances in different models.
While we have shown that the methodology helps improve the predictive performance of the HSI, which is a representative index [49], we still need to test if the methodology works for other stock markets. We repeat the same modeling procedure using the Dow Jones Industrial Average Index in the S8 Appendix in S1 File. The closing prices of the DJIA Index and its constituent stocks from 19 September 2017 to 16 September 2022 (1,258 trading days) are collected. We conduct the same procedures in the study using the HSI in the main results. All settings for the parameters are also the same, except we use M = 7 and M = 13 for a sensitivity analysis on M. The RMSEs of the LASSO regression predictions using the equations in Eq (9) and Eq (10) with M = 7 and M = 13 are shown respectively in Table 11a and 11b S8 Appendix in the S1 File, and the RMSEs are reduced under models H 1a and H 2 when we set M = 7, and the RMSEs are also reduced under model H 1b when we set M = 13. The methodology also improves the predictive performance for the DJIA Index in terms of prediction RMSEs. Hence, we believe that the methodology could be applied to different stock markets.
To conclude, we illustrate the usefulness of the order distance in terms of predicting the volatility using two sets of data: the HSI and DJIA Index. We provide a more detailed analysis on the HSI data in the Main Results section in this paper. The DJIA Index is also studied to demonstrate that the order distance also helps improve the prediction performance in another stock market. We show that there is a larger portion of days on which the lagged order distances and the modified network densities have significant Granger-causality in regard to the two proxies of the volatility, the absolute returns, and the loss, defined as the magnitude of the negative part of the returns. These two risk indicators, the order distance and the modified network density greatly improve the predictability of the absolute returns, which allows us to predict volatility more accurately, and thus prepare ourselves for possible systemic risk events. Note that, the order distance we proposed can only be measured using directed networks, which demonstrates the importance of the use of directed networks. The use of the order distance is not limited to the LASSO regression prediction model; we can include the order distance as a predictor in any volatility model to improve the predictability of tail events. In the future, we can continue to develop the application of the directed networks and the Bayesian networks in regard to the financial modeling and the systemic risk management, as the order distance is not the only information we can acquire from the Bayesian networks.